-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
kvserver: deflake TestFollowerReadsWithStaleDescriptor #62833
kvserver: deflake TestFollowerReadsWithStaleDescriptor #62833
Conversation
cbf6d8d
to
5afab80
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15 and @nvanbenschoten)
pkg/ccl/kvccl/kvfollowerreadsccl/followerreads_test.go, line 506 at r1 (raw file):
var n2Addr, n3Addr syncutil.AtomicString waitForSnapApplication := make(chan struct{})
nit: name it snapApplication
(or better yet snapApplied
); that's what the channel carries. The fact that one of the sides uses it to "wait" for the signal is generally implied by channels.
pkg/ccl/kvccl/kvfollowerreadsccl/followerreads_test.go, line 519 at r1 (raw file):
// `waitForSnapApplication` to signal that. AfterApplySnapshot: func(inSnap *kvserver.IncomingSnapshot) { if inSnap.SnapType == kvserver.SnapshotRequest_VIA_SNAPSHOT_QUEUE {
should we check for the right node id and range id before signaling?
pkg/ccl/kvccl/kvfollowerreadsccl/followerreads_test.go, line 601 at r1 (raw file):
// do. select { case <-waitForSnapApplication:
instead if this new knob, could we just poll the store on n3 until it gets the desired replica? I would probably prefer that, since waiting for a particular snapshot seems only indirectly related to what we really want - which is n3 to get the replica. And you also had to add a lot of words about the "snapshot queue", which this test doesn't feel like it need to know about.
pkg/kv/kvserver/testing_knobs.go, line 195 at r1 (raw file):
// AfterApplySnapshot is run after a snapshot is successfully applied by its // recipient. AfterApplySnapshot func(*IncomingSnapshot)
Does the arg need to be a pointer? We seem to pass that struct by value elsewhere.
1f5d57e
to
6fb9e96
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15, @andreimatei, and @nvanbenschoten)
pkg/ccl/kvccl/kvfollowerreadsccl/followerreads_test.go, line 601 at r1 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
instead if this new knob, could we just poll the store on n3 until it gets the desired replica? I would probably prefer that, since waiting for a particular snapshot seems only indirectly related to what we really want - which is n3 to get the replica. And you also had to add a lot of words about the "snapshot queue", which this test doesn't feel like it need to know about.
Changed it to just poll n3 for the non-voter, see now.
A preceeding change (cockroachdb#62696) introduced a flakey update to this test. Prior to that change, this test was using 2 voting replicas but that change tried to make it use 1 voter and 1 non-voter instead (as a litmus test for the new syntax added in cockroachdb#62696). The test rebalances a replica away from a node and ensures that a historical read sent immediately afterwards gets re-routed to the leaseholder replica, since the receiving store had its replica destroyed. However, when we're using a non-voter in this test, that non-voter may not have learned about this replication change by the time it receives this historical query and that fails the assertion. This commit re-organizes the test and fixes the flake. Release note: None
6fb9e96
to
04c09fe
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @aayushshah15, @andreimatei, and @nvanbenschoten)
thanks for the review! bors r+ |
Build succeeded: |
hey @aayushshah15 did this fix #54494? The last failure there was on the same day that this merged. But before that one failure, the others are way older and they used to happen every day - which seems weird. |
A preceding change (#62696) introduced a flakey update to this test.
Prior to that change, this test was using 2 voting replicas but that
change tried to make it use 1 voter and 1 non-voter instead (as a litmus
test for the new syntax added in #62696).
The test rebalances a replica away from a node and ensures that a
historical read sent immediately afterwards gets re-routed to the
leaseholder replica, since the receiving store had its replica
destroyed. However, when we're using a non-voter in this test, that
non-voter may not have learned about this replication change by the time
it receives this historical query and that fails the assertion.
This commit re-organizes the test and fixes the flake.
Release note: None